Hedge Scope Detection in Biomedical Texts: An Effective Dependency-Based Method
نویسندگان
چکیده
Hedge detection is used to distinguish uncertain information from facts, which is of essential importance in biomedical information extraction. The task of hedge detection is often divided into two subtasks: detecting uncertain cues and their linguistic scope. Hedge scope is a sequence of tokens including the hedge cue in a sentence. Previous hedge scope detection methods usually take all tokens in a sentence as candidate boundaries, which inevitably generate a large number of negatives for classifiers. The imbalanced instances seriously mislead classifiers and result in lower performance. This paper proposes a dependency-based candidate boundary selection method (DCBS), which selects the most likely tokens as candidate boundaries and removes the exceptional tokens which have less potential to improve the performance based on dependency tree. In addition, we employ the composite kernel to integrate lexical and syntactic information and demonstrate the effectiveness of structured syntactic features for hedge scope detection. Experiments on the CoNLL-2010 Shared Task corpus show that our method achieves 71.92% F1-score on the golden standard cues, which is 4.11% higher than the system without using DCBS. Although the candidate boundary selection method is only evaluated on hedge scope detection here, it can be popularized to other kinds of scope learning tasks.
منابع مشابه
Exploiting Multi-Features to Detect Hedges and their Scope in Biomedical Texts
In this paper, we present a machine learning approach that detects hedge cues and their scope in biomedical texts. Identifying hedged information in texts is a kind of semantic filtering of texts and it is important since it could extract speculative information from factual information. In order to deal with the semantic analysis problem, various evidential features are proposed and integrated...
متن کاملLearning the Scope of Hedge Cues in Biomedical Texts
Identifying hedged information in biomedical literature is an important subtask in information extraction because it would be misleading to extract speculative information as factual information. In this paper we present a machine learning system that finds the scope of hedge cues in biomedical texts. The system is based on a similar system that finds the scope of negation cues. We show that th...
متن کاملExtraction of Drug-Drug Interaction from Literature through Detecting Linguistic-based Negation and Clause Dependency
Extracting biomedical relations such as drug-drug interaction (DDI) from text is an important task in biomedical NLP. Due to the large number of complex sentences in biomedical literature, researchers have employed some sentence simplification techniques to improve the performance of the relation extraction methods. However, due to difficulty of the task, there is no noteworthy improvement in t...
متن کاملChinese Hedge Scope Detection Based on Structure and Semantic Information
Hedge detection aims to distinguish factual and uncertain information, which is important in information extraction. The task of hedge detection contains two subtasks: identifying hedge cues and detecting their linguistic scopes. Hedge scope detection is dependent on syntactic and semantic information. Previous researches usually use lexical and syntactic information and ignore deep semantic in...
متن کاملExploiting Rich Syntactic Features for Hedge Detection and Scope Finding∗
Hedge detection and scope finding are increasingly important tasks in information extraction, especially in the biomedical natural language processing community. In this paper, a novel approach detecting hedge cues and their scopes by sequence labeling is explored. It should be emphasized that syntactic dependencies are systematically exploited and effectively integrated by a large-scale featur...
متن کامل